Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Revamp CrossValidationReport to use EstimatorReport #1091

Merged
merged 107 commits into from
Jan 20, 2025

Conversation

glemaitre
Copy link
Member

@glemaitre glemaitre commented Jan 10, 2025

closes #731
closes #860
closes #938
closes #978
closes #976
closes #977
closes #953
closes #1012
closes #1014

It is a revamp from the current CrossValidationReporter. I am working in another module such that:

  • CrossValidationReporter is still available such that we do the transition for the UI part
  • move CrossValidationReport to a private module since we want to encourage people to import either from skore or skore.sklearn.

There is quick demo here: https://gist.github.com/glemaitre/1216d2098e3f92e045fb064b9647f3f0

TODO

Copy link
Contributor

github-actions bot commented Jan 10, 2025

Documentation preview @ d46199e

Copy link
Contributor

github-actions bot commented Jan 10, 2025

Coverage

Coverage Report for backend
FileStmtsMissCoverMissing
venv/lib/python3.12/site-packages/skore
   __init__.py140100% 
   __main__.py8180%19
   exceptions.py30100% 
venv/lib/python3.12/site-packages/skore/cli
   __init__.py50100% 
   cli.py33385%104, 111, 117
   color_format.py43390%35–>40, 41–43
   launch_dashboard.py261539%36–57
   quickstart_command.py14750%37–51
venv/lib/python3.12/site-packages/skore/item
   __init__.py21191%46
   cross_validation_item.py1371093%27–42, 373
   item.py421369%91, 94, 98–118
   item_repository.py65592%12–13, 211–212, 235
   media_item.py70494%15–18
   numpy_array_item.py25193%15
   pandas_dataframe_item.py34195%15
   pandas_series_item.py34195%15
   polars_dataframe_item.py32194%15
   polars_series_item.py27194%15
   primitive_item.py27292%13–15
   sklearn_base_estimator_item.py33195%15
   skrub_table_report_item.py10186%11
venv/lib/python3.12/site-packages/skore/persistence
   __init__.py00100% 
   abstract_storage.py22195%130
   disk_cache_storage.py33195%44
   in_memory_storage.py200100% 
venv/lib/python3.12/site-packages/skore/project
   __init__.py30100% 
   create.py52888%116–122, 132–133, 140–141
   load.py23389%43–45
   open.py140100% 
   project.py68394%118, 152, 156
venv/lib/python3.12/site-packages/skore/sklearn
   __init__.py50100% 
   _base.py140497%91, 94, 168–>173, 183–184
   find_ml_task.py45196%71–>87, 86
   types.py20100% 
venv/lib/python3.12/site-packages/skore/sklearn/_cross_validation
   __init__.py100100% 
   metrics_accessor.py1630100% 
   report.py870100% 
venv/lib/python3.12/site-packages/skore/sklearn/_estimator
   __init__.py100100% 
   metrics_accessor.py265497%149–158, 183–>236, 191, 434–>437, 783–>786
   report.py121099%216–>222, 224–>226
   utils.py11110%1–19
venv/lib/python3.12/site-packages/skore/sklearn/_plot
   __init__.py40100% 
   precision_recall_curve.py121198%229–>246, 317
   prediction_error.py970100% 
   roc_curve.py1280100% 
   utils.py840100% 
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation
   __init__.py20100% 
   cross_validation_helpers.py47490%104–>136, 123–126
   cross_validation_reporter.py35195%177
venv/lib/python3.12/site-packages/skore/sklearn/cross_validation/plots
   __init__.py00100% 
   compare_scores_plot.py29192%10, 45–>48
   timing_plot.py29194%10
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split
   __init__.py00100% 
   train_test_split.py36294%16–17
venv/lib/python3.12/site-packages/skore/sklearn/train_test_split/warning
   __init__.py80100% 
   high_class_imbalance_too_few_examples_warning.py17378%16–18, 80
   high_class_imbalance_warning.py18288%16–18
   random_state_unset_warning.py11187%15
   shuffle_true_warning.py9091%44–>exit
   stratify_is_set_warning.py11187%15
   time_based_column_warning.py22189%17, 69–>exit
   train_test_split_warning.py5180%21
venv/lib/python3.12/site-packages/skore/ui
   __init__.py00100% 
   app.py25571%24, 53–58
   dependencies.py7186%12
   project_routes.py610100% 
venv/lib/python3.12/site-packages/skore/utils
   __init__.py00100% 
   _accessor.py70100% 
   _logger.py21484%14–18
   _patch.py11546%19–35
   _progress_bar.py280100% 
   _show_versions.py310100% 
venv/lib/python3.12/site-packages/skore/view
   __init__.py00100% 
   view.py50100% 
   view_repository.py16283%8–9
TOTAL272214394% 

Tests Skipped Failures Errors Time
573 3 💤 0 ❌ 0 🔥 1m 16s ⏱️

@glemaitre glemaitre marked this pull request as draft January 10, 2025 19:22
@augustebaum
Copy link
Contributor

Side-note: Was it intentional to link to PR 538?

@glemaitre
Copy link
Member Author

Side-note: Was it intentional to link to PR 538?

Nop, it seems wrong. There might be a PR in the 500'ish to be close then :)

Copy link
Contributor

@augustebaum augustebaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apart from the caching question I just have a few suggestions.
Also, I guess the feature where we save intermediate data (e.g. save after evaluating each split) is pushed back to later?

skore/src/skore/sklearn/_cross_validation/__init__.py Outdated Show resolved Hide resolved
skore/src/skore/sklearn/_estimator/__init__.py Outdated Show resolved Hide resolved
skore/src/skore/sklearn/_estimator/report.py Outdated Show resolved Hide resolved
@glemaitre
Copy link
Member Author

glemaitre commented Jan 17, 2025

Also, I guess the feature where we save intermediate data (e.g. save after evaluating each split) is pushed back to later?

So here, the data are stored in the estimator_report that is a generator, so we should be on the path. If the idea, is to make an update on the reporter shown in the activity feed, we would an interuption when consuming the reporter and trigger partial computation of the view. But we have all the primitive to do it I think

Copy link
Contributor

github-actions bot commented Jan 17, 2025

Coverage Report for frontend

Status Category Percentage Covered / Total
🔵 Lines 85.49% 3577 / 4184
🔵 Statements 85.49% 3577 / 4184
🔵 Functions 44.85% 61 / 136
🔵 Branches 79.15% 205 / 259
File Coverage
File Stmts Branches Functions Lines Uncovered Lines
Changed Files
skore-ui/src/components/ProjectViewCard.vue 100% 100% 100% 100%
skore-ui/src/components/TreeAccordionItem.vue 83.96% 90% 20% 83.96% 23-27, 39-49, 59-60
Generated in workflow #94 for commit 25bb9ba by the Vitest Coverage Report Action

@glemaitre
Copy link
Member Author

The comment should be addressed @augustebaum @sylvaincom

Copy link
Contributor

@augustebaum augustebaum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Many thanks

@augustebaum augustebaum merged commit a3edbc3 into probabl-ai:main Jan 20, 2025
13 of 18 checks passed
MarieS-WiMLDS added a commit that referenced this pull request Jan 22, 2025
…rt (#1156)

Build upon #1091
We should merge #1091 to make it
easy to review.

This PR revisits the documentation for the `EstimatorReport` and
`CrossValidationReport`.

The idea is to provide different examples for different purposes:

- technical section: it is a section to explain more in details some
internal for user that might be interested. I think that we should move
those inside this specific section because we should have more a pure
data science section to only show the added values of our tools on data
science
- use-case section: it is indeed the section to show case our tools on
real data. Here we want to show the added value on the data science
side.

---------

Co-authored-by: Sylvain Combettes <[email protected]>
Co-authored-by: Auguste Baum <[email protected]>
Co-authored-by: Matt J. <[email protected]>
Co-authored-by: Thomas S. <[email protected]>
Co-authored-by: MarieS-WiMLDS <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment